The Art of SEO : Controlling Content with Cookies and Session IDs

12/19/2010 3:47:23 PM

Sometimes you want to more carefully dictate what a search engine robot sees when it visits your site. In general, search engine representatives refer to the practice of showing different content to users than to crawlers as cloaking, which violates the engines’ Terms of Service (TOS) and is considered spam.

However, there are legitimate uses for this concept that are not deceptive to the search engines or malicious in intent. This section explores methods for doing this with cookies and session IDs.

1. What’s a Cookie?

A cookie is a small text file that websites can leave on a visitor’s hard disk, helping them to track that person over time. Cookies are the reason Amazon.com remembers your username between visits and the reason you don’t necessarily need to log in to your Hotmail account every time you open your browser. Cookie data typically contains a short set of information regarding when you last accessed a site, an ID number, and, potentially, information about your visit (see Figure 1).

Figure 1. Using cookies to store data

Website developers can create options to remember visitors using cookies for tracking purposes or to display different information to users based on their actions or preferences. Common uses include remembering a username, maintaining a shopping cart, and keeping track of previously viewed content. For example, if you’ve signed up for an account with SEOmoz, it will provide you with options on your My Account page about how you want to view the blog and will remember that the next time you visit.
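As a rough illustration of remembering a preference with a cookie, here is a minimal Python sketch using the standard library's http.cookies module. The cookie name blog_view, its values, and the one-year lifetime are assumptions made for the example, not details of any real site's implementation.

```python
from http.cookies import SimpleCookie


def build_preference_cookie(blog_view: str, days: int = 365) -> str:
    """Return a Set-Cookie header value that stores a display preference.

    The cookie name "blog_view" is hypothetical, chosen for this example.
    """
    cookie = SimpleCookie()
    cookie["blog_view"] = blog_view
    # A Max-Age attribute makes the cookie persist across browser restarts.
    cookie["blog_view"]["max-age"] = days * 24 * 60 * 60
    cookie["blog_view"]["path"] = "/"
    return cookie["blog_view"].OutputString()


def read_preference(cookie_header: str, default: str = "snippets") -> str:
    """Read the preference back from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("blog_view")
    return morsel.value if morsel is not None else default
```

A visitor who sends no cookie at all, which includes a typical crawler, simply gets the default view.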

2. What Are Session IDs?

Session IDs are virtually identical to cookies in functionality, with one big difference. Upon closing or restarting your browser, session ID information is (usually) no longer stored on your hard drive; see Figure 2. The website you were interacting with may remember your data or actions, but it cannot retrieve session IDs from your machine that don’t persist (and session IDs by default expire when the browser shuts down). In essence, session IDs are more like temporary cookies (although, as you’ll see shortly, there are options to control this).

Figure 2. How session IDs are used

Technically speaking, session IDs are just a form of cookie without an expiration date, but it is possible to set session IDs with expiration dates similar to cookies (going out decades). In this sense, they are virtually identical to cookies. Session IDs do come with an important caveat, though: they are frequently passed in the URL string, which can create serious problems for search engines (because each new session produces a unique URL for the same, duplicate content). A simple fix is to use the canonical tag to tell the search engines which session-free URL is the preferred version of the page, so that they ignore the session IDs.
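To make the canonical tag fix concrete, the following Python sketch strips session ID parameters from a URL and emits the corresponding canonical link element. The parameter names in SESSION_PARAMS and the example.com URL are assumptions for illustration; a real site would substitute whatever names its own platform uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust to match your own platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Return the URL with session ID query parameters (and fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the canonical link element to place in the page's head section."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

With this approach, every session-specific variant of a page points the engines at the same clean URL.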

Note:

Any user can turn off cookies in the browser settings. This often makes web browsing considerably more difficult, and many sites will actually display a page saying that cookies are required to view or interact with their content. Cookies, persistent though they may be, are also deleted by users on a semiregular basis. For example, a 2007 comScore study found that 33% of web users deleted their cookies at least once per month.

3. How Do Search Engines Interpret Cookies and Session IDs?

They don’t. Search engine spiders are not built to maintain or retain cookies or session IDs, and they act as browsers with this functionality shut off. However, unlike visitors whose browsers won’t accept cookies, the crawlers can sometimes reach sequestered content by virtue of webmasters who want to specifically let them through. Many sites have pages that require cookies or sessions to be enabled but have special rules for search engine bots, permitting them to access the content as well. Although this is technically cloaking, there is a form of this known as First Click Free that search engines generally allow.

Despite the occasional access engines are granted to cookie/session-restricted pages, the vast majority of cookie and session ID usage creates content, links, and pages that limit access. Web developers can leverage the power of concepts such as First Click Free to build more intelligent sites and pages that function in optimal ways for both humans and engines.

4. Why Would You Want to Use Cookies or Session IDs to Control Search Engine Access?

There are numerous tactics for using cookies and session IDs to control search engine access. Here are some of the major strategies you can implement with these tools, though there are many other possibilities:

Show multiple navigation paths while controlling the flow of link juice

Visitors to a website often have multiple ways in which they’d like to view or access content. Your site may benefit from offering many paths to reaching content (by date, topic, tag, relationship, ratings, etc.), but doing so expends PageRank or link juice that would be better spent on a single, search-engine-friendly navigational structure. This is important because these varied sort orders may be seen as duplicate content.

You can require a cookie for users to access the alternative sort order versions of a page, and thereby prevent the search engine from indexing multiple pages with the same content. One alternative solution is to use the canonical tag to tell the search engine that these alternative sort orders are really just the same content as the original page.
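The cookie requirement for alternative sort orders could be sketched in Python along these lines. The cookie name visitor, the sort names, and the default are all hypothetical; the point is only that a request without the cookie, such as one from a crawler, always receives the single default ordering.

```python
from http.cookies import SimpleCookie

DEFAULT_SORT = "date"                       # assumed search-friendly ordering
ALLOWED_SORTS = {"date", "topic", "rating"}  # hypothetical sort options


def resolve_sort_order(requested_sort: str, cookie_header: str) -> str:
    """Pick the sort order to serve for a request.

    Alternative sort orders are honored only when the request carries the
    (hypothetical) "visitor" cookie; everyone else gets DEFAULT_SORT.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    has_cookie = "visitor" in cookies
    if has_cookie and requested_sort in ALLOWED_SORTS:
        return requested_sort
    return DEFAULT_SORT
```

Because cookie-less requests never see the alternative orderings, only one version of the page is exposed for indexing.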

Keep limited pieces of a page’s content out of the engines’ indexes

Many pages may contain content that you’d like to show to search engines and pieces you’d prefer appear only for human visitors. These could include ads, login-restricted information, links, or even rich media. Once again, showing noncookied users the plain version and cookie-accepting visitors the extended information can be invaluable. Note that this is often used in conjunction with a login, so only registered users can access the full content (such as on sites like Facebook and LinkedIn). For Yahoo! you can also use the robots-nocontent tag that allows you to specify portions of your page that Yahoo! should ignore (Google and Bing do not support the tag).

Grant access to pages requiring a login

As with snippets of content, there are often entire pages or sections of a site on which you’d like to restrict search engine access. This can be easy to accomplish with cookies/sessions, and it can even help to bring in search traffic that may convert to “registered-user” status. For example, if you had desirable content that you wished to restrict, you could create a page with a short snippet and an offer to continue reading upon registration, which would then allow access to that work at the same URL.

Avoid duplicate content issues

One of the most promising areas for cookie/session use is to prohibit spiders from reaching multiple versions of the same content, while allowing visitors to get the version they prefer. As an example, at SEOmoz, logged-in users can see full blog entries on the blog home page, but search engines and nonregistered users will see only the snippets. This prevents the content from being listed on multiple pages (the blog home page and the specific post pages), and provides a positive user experience for members.
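The snippet-versus-full-entry behavior described in this example might be approximated as follows. This is a simplified Python sketch, not SEOmoz's actual implementation; the post dictionary layout and the 50-word snippet length are assumptions.

```python
def blog_home_entries(posts, logged_in, snippet_words=50):
    """Return the entries to show on the blog home page.

    Logged-in users get full post bodies; everyone else (including search
    engine crawlers, which send no login cookie) gets a short snippet.
    Each post is assumed to be a dict with "title", "url", and "body" keys.
    """
    entries = []
    for post in posts:
        if logged_in:
            body = post["body"]
        else:
            words = post["body"].split()
            body = " ".join(words[:snippet_words])
            if len(words) > snippet_words:
                body += " ..."  # signal that the full post continues elsewhere
        entries.append({"title": post["title"], "url": post["url"], "body": body})
    return entries
```

The full text then lives only at each post's own URL for visitors who are not logged in, which is the version the engines index.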